

Poster: pabouk Date: Aug 12, 2008 5:51am
Forum: web Subject: The archive returns wrong Content-Type HTTP header

Hello,

Sorry for posting this again, but it seems I posted the message to the wrong forum and got no reply.

Several times I have seen the web archive return a bad "Content-Type" HTTP header that declares the wrong character set. Examples:

$ wget -S http://web.archive.org/web/20030524081559/http://www.iriverjapan.com/product.php?product=iHP-100
--2008-07-28 15:23:00--  http://web.archive.org/web/20030524081559/http://www.iriverjapan.com/product.php?product=iHP-100
Resolving web.archive.org... 207.241.227.154
Connecting to web.archive.org|207.241.227.154|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Mon, 28 Jul 2008 13:23:01 GMT
Server: Apache/2.2.4 (Ubuntu) PHP/5.2.3-1ubuntu6 mod_perl/2.0.2 Perl/v5.8.8
X-Powered-By: PHP/4.2.3
Content-Type: text/html; charset=UTF-8
Connection: close
Length: unspecified [text/html]

$ wget -S http://web.archive.org/web/20050518010425/http://www.didaktik.cz/pocitace_didaktik/didaktik_8.htm
--2008-07-28 15:33:53--  http://web.archive.org/web/20050518010425/http://www.didaktik.cz/pocitace_didaktik/didaktik_8.htm
Resolving web.archive.org... 207.241.227.154
Connecting to web.archive.org|207.241.227.154|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Mon, 28 Jul 2008 13:33:54 GMT
Server: Apache/2.2.4 (Ubuntu) PHP/5.2.3-1ubuntu6 mod_perl/2.0.2 Perl/v5.8.8
Accept-Ranges: bytes
ETag: "306df66a864ec51:1695"
Last-Modified: Sun, 01 May 2005 19:46:07 GMT
Content-Length: 8750
Content-Type: text/html; charset=UTF-8
Connection: close
Length: 8750 (8.5K) [text/html]

In the first case the archive returns "Content-Type: text/html; charset=UTF-8", although the archived page is in the "x-sjis" charset, as declared in its meta tags.

In the second case the archive again returns "Content-Type: text/html; charset=UTF-8" (does it always?), although the page is in the "Windows-1250" charset, which is not declared in its meta tags.
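To illustrate why the wrong label matters, here is a small Python sketch. The sample word is my own assumption (a Czech word containing accented letters), not text taken from the archived page. It shows that bytes written as Windows-1250 are not valid UTF-8, so a client that trusts the header ends up with errors or replacement characters.

```python
# Sample text is an assumption for illustration, not copied from the archived page.
sample = "počítače"

# Bytes as a Windows-1250 page would store them.
raw = sample.encode("cp1250")

# Decoding with the charset the page really uses round-trips cleanly.
correct = raw.decode("cp1250")

# Decoding as UTF-8 (what the HTTP header claims) fails in strict mode...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...and yields U+FFFD replacement characters in lenient mode.
lenient = raw.decode("utf-8", errors="replace")

print(correct)
print(strict_ok)
print("\ufffd" in lenient)
```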

In both cases a better result would be obtained by omitting the "charset=UTF-8" part. Do you know why the archive wrongly asserts the UTF-8 character set? Unfortunately, the charset given in the HTTP header takes priority over the one declared in the page itself, so a wrong header overrides a correct meta declaration. Does the archive store the original HTTP headers?
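The precedence I mean can be sketched roughly like this in Python. This is a simplified illustration, not how any particular browser or the archive is implemented: the function names are hypothetical, the meta lookup is a naive regular expression over the first 1024 bytes, byte-order-mark detection is ignored, and the "windows-1252" fallback is an assumption (real clients pick a fallback in different ways).

```python
import re


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value or "", re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html_bytes):
    """Naively look for a charset declared in a <meta> tag near the top of the page."""
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(
        r'<meta[^>]*charset\s*=\s*["\']?([^\s"\'/>;]+)', head, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def effective_charset(content_type, html_bytes, fallback="windows-1252"):
    """HTTP header first, then the meta declaration, then an assumed fallback."""
    return (
        charset_from_content_type(content_type)
        or charset_from_meta(html_bytes)
        or fallback
    )


# A page that declares x-sjis in its meta tag, like the first example.
page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=x-sjis"></head>'

print(effective_charset("text/html; charset=UTF-8", page))  # header wins
print(effective_charset("text/html", page))                 # meta is used
```

With the header as the archive sends it, the meta declaration is never consulted. With a bare "text/html" header, the page's own declaration would be used.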


Poster: pabouk Date: Oct 24, 2008 2:40am
Forum: web Subject: Re: The archive returns wrong Content-Type HTTP header

Will anyone reply please?

How are we supposed to report bugs?